Skip to content

Conversation

@0marperez
Copy link
Contributor

@0marperez 0marperez commented Oct 17, 2025

Issue #

N/A

Description of changes

  • Project setup
  • Publishing config
  • Transfer manager client
  • Business metric
  • Transfer interceptors
  • Upload file operation
    • Concurrent uploads
    • MPU part buffering
    • Code generated IO
    • Code generated type converters

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions
Copy link

A new generated diff is ready to view.

  • No codegen difference in the AWS SDK

@0marperez 0marperez added the no-changelog Indicates that a changelog entry isn't required for a pull request. Use sparingly. label Oct 17, 2025
@github-actions
Copy link

A new generated diff is ready to view.

  • No codegen difference in the AWS SDK

@github-actions
Copy link

A new generated diff is ready to view.

  • No codegen difference in the AWS SDK

@github-actions
Copy link

A new generated diff is ready to view.

  • No codegen difference in the AWS SDK

@github-actions
Copy link

A new generated diff is ready to view.

  • No codegen difference in the AWS SDK

@0marperez 0marperez marked this pull request as ready for review October 17, 2025 15:14
@0marperez 0marperez requested a review from a team as a code owner October 17, 2025 15:14
Copy link
Member

@lauzadis lauzadis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice start

public enum class AwsBusinessMetric(public override val identifier: String) : BusinessMetric {
S3_EXPRESS_BUCKET("J"),
DDB_MAPPER("d"),
S3_TRANSFER("G"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two new features S3_TRANSFER_UPLOAD_DIRECTORY and S3_TRANSFER_DOWNLOAD_DIRECTORY we should add here

subprojects {
group = "aws.sdk.kotlin"
version = hllPreviewVersion
version = if (name == "s3-transfer-manager") sdkVersion else hllPreviewVersion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can override the version in the subproject rather than hacking it in here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or better, override / set hllPreviewVersion in the submodules which need it, and use sdkVersion as the default

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried both options but it resulted in too much boilerplate in non sdkVersion modules, basically every module in hll. Decided to keep it like this for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, instead of setting hllPreviewVersion in all hll modules, can you set sdkVersion in s3-transfer-manager? That should only need to be done in one place and removes this hack

*/

description = "S3 Transfer Manager for the AWS SDK for Kotlin"
extra["displayName"] = "AWS :: SDK :: Kotlin :: HLL :: S3TransferManager"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use spaces in the display name, S3 Transfer Manager

commonMain {
dependencies {
implementation(project(":aws-runtime:aws-http"))
implementation(libs.s3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of depending on an already-published version of S3, we should require S3 to be bootstrapped locally for S3TM to build. Like how we did it for DDB Mapper

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an intentional choice to help prevent changes to S3 from breaking the S3 TM. We don't need to have the latest APIs from S3.

Copy link
Contributor Author

@0marperez 0marperez Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would happen often, but I think either approach is fine. The current one is just slightly safer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't that mean that S3TM will always depend on the n-1 S3 client? So S3TM-1.2.3 uses S3-1.2.2? That doesn't seem right.

We want to catch changes to S3 which break S3TM and fix them as soon as possible. All of our modules are a comprehensive platform and should use the same, unified set of dependency versions.

S3TransferManager.Companion {
client = s3Client
}.uploadFile {
bucket = "aoperez"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests should either be made generic or not committed yet

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see they have an @Ignore annotation, that seems fine too

* Builds a low-level S3 upload part request from a high-level upload file request
* and data from the S3 Transfer Manager.
*/
internal fun buildUploadPartRequest(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: This and the function below might be better as extension functions

/**
* An interceptor that emits the S3 Transfer Manager business metric
*/
internal object BusinessMetricInterceptor : HttpInterceptor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: add S3 / S3 Transfer Manager somewhere in the name for better IDE search discovery


internal fun build(): S3TransferManager =
S3TransferManager(
client = client?.withConfig { interceptors += BusinessMetricInterceptor } ?: error("client must be set"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correctness: the underlying client will emit a business metric for every request, whether the S3TM was used or not. Is that what we want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withConfig makes a copy of the client with some overrides as far as I understand

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right

public suspend fun uploadFile(uploadFileRequest: UploadFileRequest): Deferred<UploadFileResponse> = coroutineScope {
val multiPartUpload = uploadFileRequest.contentLength >= multipartUploadThreshold
val uploadedParts = mutableListOf<CompletedPart>()
var mpuUploadId = "null"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels better as a lateinit var

operationHook(TransferInitiated) {
if (multiPartUpload) {
context.response = client.createMultipartUpload(context.request as CreateMultipartUploadRequest)
mpuUploadId = (context.response as CreateMultipartUploadResponse).uploadId ?: throw Exception("Missing upload id in create multipart upload response")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's throw a better Exception here and elsewhere

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a scenario the user can cause themselves, is it? This could only happen because of a bug in our S3TM implementation it seems. If so, I'm generally fine with non-specific errors since we cannot expect users to catch them and act on them.

public val checksumCrc32: String?,
public val checksumCrc32C: String?,
public val checksumCrc64Nvme: String?,
public val checksumSha1: String?,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: The builder shouldn't be passing all of its values to the class's constructor. The constructor should accept a builder object and set its own fields that way. This results in less boilerplate:

public class UploadFileRequest(builder: UpdateFileRequest.Builder) {
    public val acl = builder.acl
    public val body = builder.body
    public val bucket = builder.bucket
    // ...etc...

    public companion object {
        public operator fun invoke(block: Builder.() -> Unit): UploadFileRequest =
            Builder().apply(block).build()
    }

    public class Builder {
        public var acl: ObjectCannedAcl? = null
        public var body: ByteStream? = null
        public var bucket: String? = null
        // ...etc...

        internal fun build(): UploadFileRequest = UploadFileRequest(this)
    }
}

We follow this pattern in our codegenned shapes and many of our handwritten ones.

Comment applies to other types in this PR.

public var tagging: String? = null
public var websiteRedirectLocation: String? = null

public fun build(): UploadFileRequest =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Generally our build methods are internal because we don't want users to think they have to invoke it manually inside of a DSL block:

val req = UploadFileRequest {
    bucket = "foo"
    key = "bar"
    build() // <-- this is unnecessary
}

import aws.smithy.kotlin.runtime.content.ByteStream
import aws.smithy.kotlin.runtime.time.Instant

public class UploadFileRequest private constructor(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: These classes, their builders, their converters, etc. all seem like a lot of repeated boilerplate. Why aren't we code-generating these based on the list of fields from the spec?

Comment on lines 23 to 32
/**
* Determines the actual part size to use for a multipart S3 upload.
*
* This function calculates the part size based on the total size
* of the file and the requested part size. If the requested part size is
* too small to allow the upload to fit within S3's 10,000-part limit, the
* part size will be automatically increased so that exactly 10,000 parts
* are uploaded.
*/
internal fun resolvePartSize(uploadFileRequest: UploadFileRequest, tm: S3TransferManager, logger: Logger): Long {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This function doesn't feel like it belongs in this file. Nothing else in UploadFile.kt uses it, only S3TransferManager.kt.

val targetNumberOfParts = uploadFileRequest.contentLength / tm.targePartSize
return if (targetNumberOfParts > MAX_NUMBER_PARTS) {
ceilDiv(uploadFileRequest.contentLength, MAX_NUMBER_PARTS).also {
logger.debug { "Target part size is too small to meet the 10,000 S3 part limit. Increasing part size to $it" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: 10000 is already a constant in this file. Reuse that constant in this string rather than duplicating it. This'll reduce the likelihood that the values get out of sync.

Comment on lines 39 to 42
/**
* High level utility for managing transfers to Amazon S3.
*/
public class S3TransferManager private constructor(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: This class has too much logic inside of it and that won't be sustainable once we add more operations and variants. We should refactor this, possibly so that individual operations are located in a single file and maybe even represented by a class.

* all parts to fit within S3's limit of 10,000 parts, the part size will be
* automatically increased so that exactly 10,000 parts are uploaded.
*/
public suspend fun uploadFile(uploadFileRequest: UploadFileRequest): Deferred<UploadFileResponse> = coroutineScope {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Method is too long and should be refactored.

* all parts to fit within S3's limit of 10,000 parts, the part size will be
* automatically increased so that exactly 10,000 parts are uploaded.
*/
public suspend fun uploadFile(uploadFileRequest: UploadFileRequest): Deferred<UploadFileResponse> = coroutineScope {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: This method shouldn't return Deferred<UploadFileResponse>, it should return UploadFileResponse just like our low-level APIs do. If users want this to happen asynchronously, they can wrap it in async { } themselves.

Note that coroutineScope does not return until all the work inside is completed (including child coroutines) so the Deferred value would already be filled anyway.

Comment on lines 26 to 50
public interface TransferInterceptor {
// Transfer initialization hooks
public fun readBeforeTransferInitiated(context: TransferContext) {}
public fun modifyBeforeTransferInitiated(context: TransferContext): TransferContext = context
public fun readAfterTransferInitiated(context: TransferContext) {}
public fun modifyAfterTransferInitiated(context: TransferContext): TransferContext = context

// Byte transferring hooks
public fun readBeforeBytesTransferred(context: TransferContext) {}
public fun modifyBeforeBytesTransferred(context: TransferContext): TransferContext = context
public fun readAfterBytesTransferred(context: TransferContext) {}
public fun modifyAfterBytesTransferred(context: TransferContext): TransferContext = context

// File transfer hooks
public fun readBeforeFileTransferred(context: TransferContext) {}
public fun modifyBeforeFileTransferred(context: TransferContext): TransferContext = context
public fun readAfterFileTransferred(context: TransferContext) {}
public fun modifyAfterFileTransferred(context: TransferContext): TransferContext = context

// Transfer completion hooks
public fun readBeforeTransferCompleted(context: TransferContext) {}
public fun modifyBeforeTransferCompleted(context: TransferContext): TransferContext = context
public fun readAfterTransferCompleted(context: TransferContext) {}
public fun modifyAfterTransferCompleted(context: TransferContext): TransferContext = context
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why do all of these hooks have the same TransferContext parameter? Our other interceptors have unique context parameters per hook because we gradually accumulate more context over the lifetime of the request. For instance, readBeforeFileTransferred can never have an upload ID but readAfterFileTransferred can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the items in the context nullable so we wouldn't have to create different context types. It's less complex, and interceptors can always just use ?..

Comment on lines 18 to 24
jvmTest {
dependencies {
implementation(libs.smithy.kotlin.test.jvm)
implementation(libs.smithy.kotlin.testing.jvm)
implementation(libs.smithy.kotlin.aws.signing.common)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: We should not have JVM-only tests in most cases. S3TM must work across all targets and our test dependencies should be available on all targets.

@github-actions
Copy link

A new generated diff is ready to view.

  • No codegen difference in the AWS SDK

@github-actions
Copy link

A new generated diff is ready to view.

@github-actions
Copy link

A new generated diff is ready to view.

@lauzadis lauzadis mentioned this pull request Oct 30, 2025
1 task
@github-actions
Copy link

A new generated diff is ready to view.

@github-actions
Copy link

A new generated diff is ready to view.

smithy-kotlin-telemetry-provider-otel = { module = "aws.smithy.kotlin:telemetry-provider-otel", version.ref = "smithy-kotlin-runtime-version" }
smithy-kotlin-test-suite = { module = "aws.smithy.kotlin:test-suite", version.ref = "smithy-kotlin-runtime-version" }
smithy-kotlin-testing = { module = "aws.smithy.kotlin:testing", version.ref = "smithy-kotlin-runtime-version" }
smithy-kotlin-test-jvm = { module = "aws.smithy.kotlin:http-test-jvm", version.ref = "smithy-kotlin-runtime-version" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: smithy-kotlin-http-test-jvm

blankLine()

imports += ImportDirective(operation.request.lowLevel.type, operation.request.lowLevelName)
imports += ImportDirective(operation.request.lowLevel.type, operation.request.lowLevelName) // TODO: Bookmarking so I can implement this myself
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this TODO mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a note to myself, I forgot to delete it.


description = "S3 Transfer Manager Code Generation"
extra["displayName"] = "AWS :: SDK :: Kotlin :: HLL :: S3 Transfer Manager Codegen"
extra["moduleName"] = "aws.sdk.kotlin.hll.s3transfermanager.codegen"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we decided on a package name yet? I think s3tm would be simpler, cleaner

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can go either way, not sure if Ian has an opinion.

/**
* High level S3 TM request/response from low level S3 operation members
*/
internal data class IOMapping(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: IoMapping? same applies in other places you use IO

}
}

// This is copied from :hll:dynamodb-mapper:dynamodb-mapper. TODO: Commonize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this doesn't work for native, I have a fix in my kn-merge-main branch. You'll probably need to refactor to take that change once I merge it

* The maximum amount of parts to buffer in memory while waiting for uploads to complete.
* The actual number of parts buffered at any given time may be less than or equal but never greater.
*
* Defaults to 5.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did we decide this number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With our 8MB byte default part size that means up to 40MB could be used as a buffer. Given we don't know anything about the user's env and memory limitations, 40MB seems like not too much but not too little.

import kotlinx.coroutines.currentCoroutineContext
import kotlinx.coroutines.withContext

internal suspend fun uploadFileImplementation(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: having the name "implementation" in a function implementation feels wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do it all over the SDK and Smithy Kotlin, but we usually use the suffix impl.

val targetNumberOfParts = contentLength / targetPartSize
return if (targetNumberOfParts > MAX_NUMBER_PARTS) {
ceilDiv(contentLength, MAX_NUMBER_PARTS).also {
logger.warn { "Target part size is too small to meet the $MAX_NUMBER_PARTS S3 part limit. Increasing part size to $it" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we clarify with the spec author what level this should be logged at?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me ask

Copy link
Contributor Author

@0marperez 0marperez Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing in the spec mentioning logging a message when the configured part size isn't used btw.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec author uses DEBUG

import aws.sdk.kotlin.services.s3.model.CreateMultipartUploadResponse

internal suspend fun initiateTransfer(
multiPartUpload: Boolean,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: multipartUpload


// TODO: Setup e2e test environment - can't run these every build and in CI
class UploadFileTest {
@Test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are no longer @Ignore but they also aren't generic. Please refactor to make them runnable by anyone, or exclude them from the test suite

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should've been ignored before I posted the review, my bad.

@github-actions
Copy link

A new generated diff is ready to view.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-changelog Indicates that a changelog entry isn't required for a pull request. Use sparingly.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants